Generating Prosodic Structure for Synthesis of Swedish Intonation
Abstract
This article presents an outline of the prosodic constituent structure which will be incorporated in a linguistic preprocessor forming part of a text-to-speech system for generation of intonation in Swedish restricted texts.

INTRODUCTION

One of the goals of current research in text-to-speech systems is to improve the quality of intonation by developing algorithms for preprocessing texts in order to extract grammatical and discourse information necessary for the generation of appropriate prosodic patterns. In previous publications, we have reported on the work that we have done developing a preprocessor which tracks coreferential relations using lexical-semantic and morphological information to find referential identity between content words in restricted texts dealing with the stock-market (Horne & Johansson 1991, 1993, Horne et al. 1993a,b). This information is important in order to predict the location of the final focal accent in an utterance.

PROSODIC STRUCTURE AND PHRASING

Our current efforts are being directed towards the development of an algorithm which will allow further preprocessing of our restricted texts. The goal is to use the information on coreferentiality obtained from the referent tracking algorithm, together with further information on lexical category designation, to group words together into a hierarchy of prosodic constituents such as those discussed in Bruce & Granström (1993). Information on prosodic structure is needed in order to better predict the location as well as the particular form of tone accents associated with utterance-internal prosodic boundaries.

Minimal Parsing

Following an approach similar to Bachenko & Fitzpatrick (1990) and Quené & Kager (1993), and inspired by concepts within prosodic phonology (e.g. Nespor & Vogel 1986), we are attempting to determine how one, using a minimal amount of parsing, can obtain enough information to construct a hierarchical prosodic structure for each sentence in a text.
Unlike other researchers, however, we are also using contextual information such as coreference in our approach to generating prosodic structure.

Prosodic Constituents

At least three levels of prosodic structure are required for Swedish in order to model all the prosodic information observed in our data. The smallest of these is the Prosodic Word, which we will define as corresponding to a content word and any following function words up to the next content word within a given clause. At the beginning of a clause, the Prosodic Word can also begin with one or more function words.

The Prosodic Word is characterized by a word accent and potentially a focal accent (Accent 1 = HL*(H̄L̄), Accent 2 = H*L(H̄L̄)). We use H̄ and L̄ to represent, respectively, a focal high and the low tone accent following a focal high, in order to distinguish them from the H and L associated with the word accents. The Prosodic Word is also marked by a boundary tone, which is realized by a final rise (H#) when the content word is not focussed (i.e. contextually given) or by a fall (L#) when the content word is focussed.

This L# can be thought of as a potential low Prosodic Phrase boundary, i.e. given the proper contextual environment including sufficient duration, the L can be realized low enough to be interpreted as a L% boundary (cf. Bruce et al. 1993, who present experimental evidence that increasing the size of an F0 fall after a focal H can lead speakers to perceive a phrase boundary). The H# in its turn can be thought of as a potential H% boundary, e.g. a ‘continuation rise’ associated with nonfinality. Thus a Prosodic Phrase boundary always correlates with a Prosodic Word boundary, but not vice versa. These boundary tones, we claim, play an important role in creating the transitions between consecutive Prosodic Words in a larger Prosodic Phrase. They are also points for potential pauses, e.g. before focussed content words (see Gårding 1967, Strangert 1993).
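The accent and boundary-tone rules just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `ProsodicWord`, `tone_labels`, `WORD_ACCENT` and `FOCAL_TONES` are hypothetical, the focal tones are written in ASCII as `Hf Lf`, and the accent class and focus status of each Prosodic Word are assumed to be supplied by earlier processing (lexicon lookup and referent tracking).

```python
from dataclasses import dataclass

# Word-accent patterns as given in the text.
WORD_ACCENT = {1: "HL*", 2: "H*L"}
# ASCII stand-in (an assumption of this sketch) for the focal high and
# the low tone that follows it.
FOCAL_TONES = "Hf Lf"


@dataclass
class ProsodicWord:
    text: str
    accent: int      # 1 or 2: word accent class of the content word
    focussed: bool   # False means the content word is contextually given


def tone_labels(pw: ProsodicWord) -> tuple:
    """Return (accent string, boundary tone) for one Prosodic Word."""
    accent = WORD_ACCENT[pw.accent]
    if pw.focussed:
        # Focussed: focal tones are added and the word ends in a fall (L#).
        return accent + " " + FOCAL_TONES, "L#"
    # Not focussed (given): word accent only, ending in a final rise (H#).
    return accent, "H#"


if __name__ == "__main__":
    # Placeholder inputs; the accent classes are not claims about real words.
    for pw in (ProsodicWord("w1", 1, False), ProsodicWord("w2", 2, True)):
        print(pw.text, *tone_labels(pw))
```

Whether an L# or H# is promoted to a phrase-level L% or H% depends on context and duration, as noted above, and is deliberately left out of this sketch.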
The Prosodic Word does not necessarily correspond to a syntactic constituent, as the example in (1) illustrates (‘–’ represents the boundary between Prosodic Words). This type of ‘nonsyntactic’ grouping is perhaps more characteristic of well-planned read texts or spontaneous speech than of texts that are not well planned, read e.g. by a non-expert or non-professional. It can be characterized as more rhythmically based than a grouping adhering strictly to syntactic phrase boundaries, since it begins with a lexical word which has predominantly left-edge stress. We realize that this definition of the Prosodic Word is not the only possible one. However, it corresponds to the most common type of grouping for the speaker whose speech we are modelling, and we have therefore decided to use it as a working definition for purposes of algorithm development.

(1) Kurserna på – Stockholmsbörsen – fortsätter att – falla.
    Rates(det) on – Stockholm Stock Exchange(det) – continue to – fall
    ‘Rates on Stockholm’s Stock Exchange continue to fall’

Figure 1 illustrates the prosodic structure of (1) produced by the female speaker whose prosody we are modelling. She is an ‘expert’ speaker, i.e. she has detailed knowledge of the domain she is talking about (stock-market), and the well-planned impression her speech gives probably results both from this fact and from her long experience as the principal reader of stock-market reports on Radio Sweden (she retired in 1992).
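The working definition of the Prosodic Word can be illustrated with a minimal Python sketch that reproduces the grouping in (1). It rests on assumptions that are not part of the article: the input is a single clause, each word already carries a content/function tag, and the function name `group_prosodic_words` and the tags given to the words of (1) are hypothetical choices made to match the grouping shown there.

```python
from typing import List, Tuple

Token = Tuple[str, bool]  # (word form, True if content word)


def group_prosodic_words(clause: List[Token]) -> List[List[str]]:
    """Group the words of one clause into Prosodic Words.

    Each Prosodic Word is a content word plus any following function
    words up to the next content word. Clause-initial function words
    are attached to the first Prosodic Word.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    seen_content = False
    for form, is_content in clause:
        # A content word opens a new group, except for the first content
        # word, which absorbs any clause-initial function words.
        if is_content and seen_content:
            groups.append(current)
            current = []
        current.append(form)
        seen_content = seen_content or is_content
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Example (1); the content/function tags are assumptions of this sketch.
    clause = [
        ("Kurserna", True), ("på", False), ("Stockholmsbörsen", True),
        ("fortsätter", True), ("att", False), ("falla", True),
    ]
    print(" – ".join(" ".join(g) for g in group_prosodic_words(clause)))
```

A single left-to-right pass is enough here because the definition only looks at lexical category and clause position, which is in keeping with the minimal-parsing approach; coreference information would enter at the later stage where focus and boundary tones are assigned.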
Publication date: 1994